Multiple Testing

Advanced Analytics with R (UG 21-24)

Ayush Patel

Before we start

Please load the following packages

library(tidyverse)
library(MASS)
library(ISLR)
library(ISLR2)
library(infer) # install {infer} if you don't have it



Access lecture slide from bit.ly/aar-ug

Warrior's armor (gusoku)
Source: Armor (Gusoku)

Hello

I am Ayush.

I am a researcher working at the intersection of data, law, development and economics.

I teach Data Science using R at the Gokhale Institute of Politics and Economics.

I am an RStudio (Posit) certified tidyverse Instructor.

I am a Researcher at the Oxford Poverty and Human Development Initiative (OPHI), University of Oxford.

Reach me

ayush.ap58@gmail.com

ayush.patel@gipe.ac.in

Motivation for Tests for Significance

  • Helps us answer: is it just chance?
  • Helps us understand and quantify uncertainty and variability.
  • Helps us make and verify claims about a population using sample estimates.
  • Helps us make and verify causal claims.

The numbered tickets example

refer: Statistics by David Freedman

Example

Write a simulation that will implement the previous example.
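The exact setup comes from Freedman's text, but one minimal sketch of a box model is below. All specifics here are illustrative assumptions: a box of tickets numbered 1 to 100, 25 draws with replacement, and a z-style statistic comparing the sample average to the box average.

```r
# Hypothetical box model (the numbers in Freedman's example may differ):
# tickets numbered 1 to 100, draw 25 with replacement.
set.seed(42)

box <- 1:100     # the box of numbered tickets
n_draws <- 25    # number of tickets drawn

draws <- sample(box, size = n_draws, replace = TRUE)

observed_avg <- mean(draws)  # average of the draws
expected_avg <- mean(box)    # expected value under the box model

# Standard error of the average; sd() divides by n - 1,
# close enough to the box SD for a sketch.
se_avg <- sd(box) / sqrt(n_draws)

# How many standard errors is the observed average from expectation?
z <- (observed_avg - expected_avg) / se_avg
z
```

If |z| is large, chance variation alone is a poor explanation for the gap between the observed and expected averages.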

The sex discrimination case study

refer: Introduction to Modern Statistics

A little about simulation and {infer}

Base

get_diffs <- function(...){

  # Code promotions as 1/0
  decisions <- ifelse(
    openintro::sex_discrimination$decision == "promoted",
    1, 0
  )

  # Shuffle all 48 decisions (one permutation of the labels)
  sample(decisions,
         size = 48,
         replace = FALSE) -> shuffled_decision

  # Difference in promotion rates between the two shuffled groups of 24
  mean(shuffled_decision[1:24]) - mean(shuffled_decision[25:48])

}

map_dbl(c(1:100), 
        get_diffs)
  [1]  0.11166667  0.03000000 -0.13333333  0.07000000 -0.21500000 -0.05166667
  [7]  0.11166667 -0.05166667  0.11166667 -0.05166667  0.03000000  0.03000000
 [13] -0.05166667 -0.01166667  0.03000000  0.11166667 -0.09333333  0.11166667
 [19] -0.17500000 -0.05166667 -0.05166667  0.11166667 -0.05166667  0.03000000
 [25]  0.31500000 -0.21500000  0.19333333 -0.05166667 -0.13333333 -0.01166667
 [31] -0.13333333 -0.13333333 -0.09333333  0.11166667  0.03000000 -0.09333333
 [37] -0.01166667 -0.13333333 -0.21500000  0.03000000  0.19333333 -0.13333333
 [43]  0.03000000 -0.05166667  0.03000000 -0.13333333 -0.21500000  0.03000000
 [49]  0.07000000 -0.01166667  0.03000000  0.03000000  0.23333333 -0.01166667
 [55]  0.19333333 -0.09333333 -0.09333333  0.11166667  0.35666667  0.11166667
 [61]  0.07000000  0.03000000 -0.13333333  0.07000000  0.19333333  0.23333333
 [67] -0.01166667  0.11166667  0.11166667  0.03000000  0.03000000  0.19333333
 [73]  0.11166667  0.19333333 -0.05166667 -0.09333333  0.07000000 -0.09333333
 [79] -0.05166667 -0.09333333 -0.13333333  0.03000000  0.27500000  0.11166667
 [85]  0.07000000 -0.09333333  0.11166667  0.03000000 -0.21500000  0.07000000
 [91]  0.19333333 -0.13333333  0.19333333 -0.05166667 -0.05166667 -0.13333333
 [97] -0.01166667 -0.25666667  0.19333333  0.15166667

infer

openintro::sex_discrimination|>
  infer::specify(decision ~ sex,
                 success = "promoted")|>
  infer::hypothesise(null = "independence")|>
  infer::generate(reps = 100,
                  type = "permute")|>
  infer::calculate(stat = "diff in props",
                   order = c("male","female"))
Response: decision (factor)
Explanatory: sex (factor)
Null Hypothesis: independence
# A tibble: 100 × 2
   replicate    stat
       <int>   <dbl>
 1         1 -0.0417
 2         2 -0.208 
 3         3 -0.0417
 4         4  0.0417
 5         5 -0.0417
 6         6  0.0417
 7         7 -0.0417
 8         8  0.125 
 9         9 -0.0417
10        10  0.0417
# ℹ 90 more rows

Both work fine

but which is more convenient?

tibble(
  base_diffs = map_dbl(c(1:1000), 
        get_diffs),
  infer_diffs = openintro::sex_discrimination|>
  infer::specify(decision ~ sex,
                 success = "promoted")|>
  infer::hypothesise(null = "independence")|>
  infer::generate(reps = 1000,
                  type = "permute")|>
  infer::calculate(stat = "diff in props",
                   order = c("male","female"))|>
    pull(stat)
) -> differences_in_props

differences_in_props
# A tibble: 1,000 × 2
   base_diffs infer_diffs
        <dbl>       <dbl>
 1    -0.0517     -0.0417
 2    -0.215      -0.0417
 3     0.315       0.292 
 4     0.193       0.0417
 5    -0.133      -0.125 
 6     0.0700     -0.125 
 7    -0.0517      0.0417
 8     0.0700     -0.125 
 9     0.112       0.125 
10    -0.133       0.0417
# ℹ 990 more rows

Both work fine

Get p-values

Base

map_dbl(
  c(1:1000),
  get_diffs
) >= 0.292 -> greater_equal_obs_stat

sum(greater_equal_obs_stat)/1000
[1] 0.009

infer

  openintro::sex_discrimination|>
  infer::specify(decision ~ sex,
                 success = "promoted")|>
  infer::hypothesise(null = "independence")|>
  infer::generate(reps = 1000,
                  type = "permute")|>
  infer::calculate(stat = "diff in props",
                   order = c("male","female"))|>
  infer::get_p_value(obs_stat = 0.292,
                     direction = "right")
# A tibble: 1 × 1
  p_value
    <dbl>
1   0.006

Exercise

Use both methods for the sex discrimination study, iterate several times. Show that both methods generate similar p-values.

Testing Framework

  • Define Null and Alternative Hypothesis
  • Construct Test Statistic
  • Compute p-value
  • Decide to reject the Null or fail to reject
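The four steps above can be traced end to end with {infer} on the same sex discrimination data, reusing the difference in promotion proportions as the test statistic (reps = 1000 and the right-tailed direction are illustrative choices):

```r
library(infer)

# Step 1: H0 - decision is independent of sex; Ha - it is not.
# Step 2: test statistic - observed difference in promotion proportions.
obs_stat <- openintro::sex_discrimination |>
  specify(decision ~ sex, success = "promoted") |>
  calculate(stat = "diff in props", order = c("male", "female"))

# Step 3: p-value from the permutation null distribution.
null_dist <- openintro::sex_discrimination |>
  specify(decision ~ sex, success = "promoted") |>
  hypothesise(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in props", order = c("male", "female"))

p_val <- get_p_value(null_dist, obs_stat = obs_stat, direction = "right")

# Step 4: reject H0 if p_val falls below the chosen alpha.
p_val
```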

Type I and Type II Errors

  • Type I Error: \(H_0\) is actually true but we erroneously reject it.
  • Type II Error: \(H_0\) is actually false but we fail to reject it.
  • Power of a hypothesis test: the probability of not making a Type II error when \(H_a\) actually holds.

The Multiple Testing problem

Discuss the coin flip example from ISLR

Note

If we test hundreds of null hypotheses at \(\alpha = 0.01\), how many Type I errors should we expect?
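A quick simulation makes the note concrete. The specifics are illustrative assumptions: 500 true null hypotheses, each a one-sample t-test on N(0, 1) noise, tested at \(\alpha = 0.01\).

```r
set.seed(1)

m <- 500       # number of true null hypotheses (illustrative)
alpha <- 0.01

# Each test is a one-sample t-test on pure noise, so H0 is true every time;
# any rejection is a Type I error.
p_values <- replicate(m, t.test(rnorm(30))$p.value)

sum(p_values <= alpha)  # realised Type I errors
m * alpha               # expected number: 5
```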

The Family-Wise Error Rate

The probability of making at least one Type I error when testing \(m\) hypotheses.

| Decision              | \(H_0\) is True | \(H_0\) is False | Total |
|-----------------------|-----------------|------------------|-------|
| Reject \(H_0\)        | V               | S                | R     |
| Do Not Reject \(H_0\) | U               | W                | m - R |
| Total                 | \(m_0\)         | \(m - m_0\)      | m     |

\(FWER = Pr(V \geq 1)\)

The Family-Wise Error Rate

\[FWER(\alpha) = 1 - Pr(V = 0)\]

Assuming the \(m\) tests are independent, each with size \(\alpha\):

\[FWER(\alpha) = 1 - \prod_{j=1}^m (1-\alpha) = 1 - (1-\alpha)^m\]
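This closed form grows quickly with \(m\); a short check:

```r
# FWER for m independent tests, each run at level alpha
fwer <- function(alpha, m) 1 - (1 - alpha)^m

fwer(0.05, 1)    # 0.05
fwer(0.05, 10)   # ~0.40
fwer(0.05, 100)  # ~0.99
```

With 100 independent tests at \(\alpha = 0.05\), at least one false rejection is almost guaranteed.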

The Family-Wise Error Rate

The Fund data

str(ISLR2::Fund)
'data.frame':   50 obs. of  2000 variables:
 $ Manager1   : num  -3.34 3.76 12.97 -4.87 2.02 ...
 $ Manager2   : num  -4.17 12.53 -2.58 7.98 -5.37 ...
 $ Manager3   : num  9.389 3.403 -0.825 -4.027 -4.855 ...
 $ Manager4   : num  8.417 0.144 6.585 -4.732 10.594 ...
 $ Manager5   : num  0.998 -7.222 17.05 0.503 -6.892 ...
 $ Manager6   : num  7.1915 0.0677 1.8571 0.7402 9.8778 ...
 $ Manager7   : num  -10.77 -10.74 3.2 -28.97 1.43 ...
 $ Manager8   : num  4.07 -1.14 -7.98 4.68 9.84 ...
 $ Manager9   : num  1.575 -7.167 -1.214 -0.569 5.311 ...
 $ Manager10  : num  -0.799 4.779 2.338 -4.001 18.365 ...
 $ Manager11  : num  -8.829 -0.457 -10.853 -9.79 -3.576 ...
 $ Manager12  : num  2.757 3.703 5.726 7.952 0.325 ...
 $ Manager13  : num  -2.21 1.78 -1.33 -0.37 6.94 ...
 $ Manager14  : num  17.3 21.34 8.81 -5.36 19.09 ...
 $ Manager15  : num  2.82 -4.56 -8.56 4.73 -5.75 ...
 $ Manager16  : num  0.162 -6.011 -1.394 -14.178 -3.004 ...
 $ Manager17  : num  -0.518 -9.804 4.922 1.826 2.854 ...
 $ Manager18  : num  -5.05 0.731 7.633 -2.493 6.278 ...
 $ Manager19  : num  -2.35 3.38 -3.86 -4.76 -4.33 ...
 $ Manager20  : num  -8.88 -3.31 -9.69 9.41 -1.15 ...
 $ Manager21  : num  -0.621 3.753 -13.275 -4.449 -12.233 ...
 $ Manager22  : num  -0.989 3.994 -5.297 -10.722 -9.546 ...
 $ Manager23  : num  3.138 -2.519 -3.04 1.939 -0.689 ...
 $ Manager24  : num  -23.11 -9.07 3.98 -11.01 -10.34 ...
 $ Manager25  : num  7.71 -6.32 1.99 11.65 2.32 ...
 $ Manager26  : num  -1.23 -1.76 -2.53 4.89 1.92 ...
 $ Manager27  : num  -19.398 0.638 6.21 -1.795 -17.354 ...
 $ Manager28  : num  -1.8965 -2.755 9.4749 -7.8057 -0.0393 ...
 $ Manager29  : num  8.623 13.635 0.439 -5.011 3.697 ...
 $ Manager30  : num  5.549 4.239 12.66 -0.417 7.12 ...
 $ Manager31  : num  -3.61 3.35 14.82 2.97 1.88 ...
 $ Manager32  : num  -4.33 -11.25 -5.7 13.54 -2.39 ...
 $ Manager33  : num  -0.216 -4.071 -1.429 5.625 -6.485 ...
 $ Manager34  : num  3.17 -1.66 -1.72 1.1 -1.94 ...
 $ Manager35  : num  12.17 6.33 9.25 6.53 10.73 ...
 $ Manager36  : num  -1.167 -2.71 -1.845 0.338 -1.918 ...
 $ Manager37  : num  2.263 -1.988 0.361 1.516 -4.077 ...
 $ Manager38  : num  -5.07 1.84 -2.58 7.84 5.38 ...
 $ Manager39  : num  5.189 5.454 2.291 -0.847 -4.622 ...
 $ Manager40  : num  2.26 -3.83 -3.73 -5.55 -4.12 ...
 $ Manager41  : num  -13.49 3.51 -6.69 2.16 2.88 ...
 $ Manager42  : num  1.63 1.26 -2.59 6.19 -11.35 ...
 $ Manager43  : num  -0.603 1.975 -7.385 4.527 -1.358 ...
 $ Manager44  : num  8.949 -16.9698 -7.4223 -16.1439 -0.0706 ...
 $ Manager45  : num  -4.76 -3.88 -1.61 -6.66 -11.18 ...
 $ Manager46  : num  1.253 -0.489 -0.101 -0.74 -3.132 ...
 $ Manager47  : num  -3.88 -4.98 12.05 15.64 16.13 ...
 $ Manager48  : num  -2.94 -11.23 7.15 -12.89 -5.26 ...
 $ Manager49  : num  -7 -0.541 1.899 -3.422 -7.338 ...
 $ Manager50  : num  13.179 -0.495 9.227 -7.344 1.01 ...
 $ Manager51  : num  -1.364 -2.022 0.807 -1.783 -2.933 ...
 $ Manager52  : num  -4.58 -11.95 9.83 15.42 24.64 ...
 $ Manager53  : num  0.405 -0.1178 -0.0948 -1.4143 -2.5958 ...
 $ Manager54  : num  15.22 8.22 11.86 -3.07 16.02 ...
 $ Manager55  : num  -1.39 1.37 1.16 2.97 -2.09 ...
 $ Manager56  : num  8.39 -5.82 11.71 3.68 1.35 ...
 $ Manager57  : num  9.98 -12.02 4.31 28.43 13.25 ...
 $ Manager58  : num  12.5 2.373 -2.01 3.105 -0.846 ...
 $ Manager59  : num  -7.81 -3.29 2.4 1.64 -3.32 ...
 $ Manager60  : num  11.641 0.889 -5.322 -0.871 22.7 ...
 $ Manager61  : num  -14.79 -8.06 -2.34 2.98 3.64 ...
 $ Manager62  : num  -10.858 -0.878 -10.937 0.84 -13.339 ...
 $ Manager63  : num  -2.91 -3.54 -0.26 2.18 -2.23 ...
 $ Manager64  : num  -7.89 -10.9 11.52 8.85 -7.07 ...
 $ Manager65  : num  2.86 10.38 5.33 -1.1 3.73 ...
 $ Manager66  : num  -0.518 2.597 2.695 4.578 -0.952 ...
 $ Manager67  : num  -2.5425 -1.1018 -0.0331 2.645 1.4676 ...
 $ Manager68  : num  -2.349 0.296 7.415 14.348 -2.833 ...
 $ Manager69  : num  -9.062 -0.225 -8.309 -6.805 -2.778 ...
 $ Manager70  : num  12.86 7.57 -2.16 14.47 -7.63 ...
 $ Manager71  : num  0.68 2.65 5.08 1.73 -4.86 ...
 $ Manager72  : num  -4.17 -1.85 -5.81 -19.2 -14.14 ...
 $ Manager73  : num  0.0154 1.438 10.7116 5.3531 -5.9821 ...
 $ Manager74  : num  2.85 2.07 -7.73 5.29 -9.92 ...
 $ Manager75  : num  10.19 11.46 7.9 7.07 7 ...
 $ Manager76  : num  -2.763 5.101 1.383 -0.439 4.541 ...
 $ Manager77  : num  6.27 -1.46 6.25 16.86 -3.69 ...
 $ Manager78  : num  5.18 12.4 -7.1 4.36 6.09 ...
 $ Manager79  : num  4.648 -2.28 1.071 3.254 0.521 ...
 $ Manager80  : num  1.44 7.53 -16.5 1.38 3.86 ...
 $ Manager81  : num  8.25 -6.41 -1.02 -6.71 5.07 ...
 $ Manager82  : num  1.19 8.57 -1.67 5.95 -18.16 ...
 $ Manager83  : num  11.86 -7.89 12.85 -2.2 8.2 ...
 $ Manager84  : num  -19.62 -4.52 2.34 16.13 1.28 ...
 $ Manager85  : num  4.7742 -8.3646 0.0774 -1.2245 1.1547 ...
 $ Manager86  : num  11.32 1.9 6 -5.43 -1.7 ...
 $ Manager87  : num  -6.41 -4 -7.02 14.22 -6.59 ...
 $ Manager88  : num  -10.91 7.3 -19.51 -5.6 2.94 ...
 $ Manager89  : num  -4.84 -0.144 -9.604 2.514 -4.605 ...
 $ Manager90  : num  12.58 -6.34 20.08 4.06 5.93 ...
 $ Manager91  : num  -0.959 0.802 16.763 6.745 7.085 ...
 $ Manager92  : num  0.335 0.097 -0.268 0.885 1.061 ...
 $ Manager93  : num  -0.6587 -1.8218 1.1954 -0.0847 4.0122 ...
 $ Manager94  : num  -13.477 -6.263 -10.25 0.906 -6.278 ...
 $ Manager95  : num  -5.98 -21.23 -18.13 6.98 -2.21 ...
 $ Manager96  : num  -3.315 9.382 5.111 -5.099 0.673 ...
 $ Manager97  : num  1.753 2.697 -2.878 -4.72 0.147 ...
 $ Manager98  : num  -10.5 10.72 -5.41 -9.76 -4.94 ...
 $ Manager99  : num  -11.086 -15.652 1.891 -8.532 -0.665 ...
  [list output truncated]

Bonferroni Correction

\[FWER = Pr(\text{making at least one Type I error})\]

Let \(A_j\) be the event that we make a Type I error on the \(j\)-th test.

\(FWER = Pr(\bigcup_{j=1}^m A_j)\)

\(FWER \leq \sum_{j=1}^m Pr(A_j)\) (by the union bound)

Testing each hypothesis at level \(\alpha/m\) therefore gives

\(FWER(\alpha/m) \leq m \cdot \frac{\alpha}{m} = \alpha\)
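In practice the correction can be applied either by comparing each p-value to \(\alpha/m\) directly, or with base R's p.adjust(). A sketch on simulated p-values (the mix of 90 true nulls and 10 genuine effects is a hypothetical choice, not the Fund data):

```r
set.seed(7)

alpha <- 0.05
m <- 100

# Hypothetical mix: 90 true nulls and 10 genuine effects.
p_values <- c(
  replicate(90, t.test(rnorm(30))$p.value),
  replicate(10, t.test(rnorm(30, mean = 1))$p.value)
)

sum(p_values <= alpha)      # rejections without any correction
sum(p_values <= alpha / m)  # Bonferroni, applied directly

# Equivalent route: inflate the p-values, then compare to alpha.
sum(p.adjust(p_values, method = "bonferroni") <= alpha)
```

The two Bonferroni routes agree: p.adjust() multiplies each p-value by \(m\) (capping at 1), so comparing the adjusted values to \(\alpha\) is the same as comparing the raw values to \(\alpha/m\).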

Exercise

On the Fund data, run a separate t-test (\(\alpha = 0.05\)) for each fund manager to see whether average excess returns are non-zero. Then repeat with the Bonferroni correction and see how many remain significantly non-zero.